The ABC recently publish an interesting piece of data-journalism exploring “what makes Fremantle Dockers captain Nat Fyfe so good?” The authors put forth a compelling, data-driven case for the specific attributes that make Nathan Fyfe stand out from other players
Below is an eye-catching visualization from the article:
The scatterplot above depicts contested possessions (x-axis) by contested marks (y-axis). Each blue data-point is a player’s statistical average on these metrics for a single season
It shows between 2014 and 2019, Nat Fyfe was something of a statistical anomaly - exhibiting a rare combination of elite contested ball winning ability and aerial dominance. No other player in the competition was able to combine these two traits to such a level
To this end, the authors ordain Nat Fyfe as the The Flying Bulldozer
This got me thinking what other players demonstrate anomalously good skill-profiles for specific combinations of metrics?
A flexible way to answer this question is via an interactive shiny application that loosely replicates the above scatterplot, but also allows end-users to input match-day metrics of their choosing
This walkthrough will cover how to do this, summarizing:
- Sourcing, preparing & modeling of data
- Extracting additional features such as # of games
- Launching & programming the Shiny app
- Sharing some of my own examples of player/metric combinations that stood out
Let’s dive in!
The Data
As per my previous post, I have utilized the “FitzRoy” package to extract data from AFLTables & Footywire
Referencing this useful link by the Fitzory package author, I was able to find a faster, more data-rich method to source this data
In my previous post, extracting data was relatively time-consuming, & required imputation of match_ids corresponding to the range of games
The below code snippet only requires the year(s) of interest to be imputed
library(dplyr)
library(tidyr)
library(ggplot2)
library(plotly)
library(fitzRoy)
library(shiny)
library(shinyWidgets)
library(ggthemes)
library(tableHTML)
library(kableExtra)
library(knitr)The structure of this extract includes all AFL premiership season matches from 2017 to the present. Each individual match is associated with 44 rows, one for each player who participated in the game. Over four seasons of football, this accumulates to 33,264 rows of data
Re-shape the data to player averages
Two key steps required to transform this data are:
- Aggregation so each row reflects statistical averages for each player
- Calculation of other salient player details - specifically players current team, # of games & field position
To make things more manageable, let’s filter to only the fields of interest
afl.2 <- afl.1 %>%
select(# player details
player_id,
player_first_name,
player_last_name,
# player statistics
kicks,
handballs,
disposals,
disposal_efficiency_percentage,
goals,
behinds,
hitouts,
tackles,
rebounds,
inside_fifties,
clearances,
clangers,
free_kicks_for,
free_kicks_against,
contested_possessions,
uncontested_possessions,
brownlow_votes,
contested_marks,
marks_inside_fifty,
one_percenters,
bounces,
goal_assists,
time_on_ground_percentage,
centre_clearances,
stoppage_clearances,
score_involvements,
metres_gained,
turnovers,
intercepts,
tackles_inside_fifty,
contest_def_losses,
contest_def_one_on_ones,
contest_off_one_on_ones,
contest_off_wins,
def_half_pressure_acts,
effective_kicks,
f50_ground_ball_gets,
ground_ball_gets,
hitouts_to_advantage,
intercept_marks,
pressure_acts,
ruck_contests,
score_launches,
spoils
)& calculate the average match-day statistics for each player
This is achieved by grouping the data by player, then using summarise_at to calculate the arithmetic average (i.e. mean) for all 43 match-day statistics
# Aggregate
player <- afl.2 %>%
group_by(player_id,
player_first_name,
player_last_name) %>%
summarise_at(.vars = colnames(.)[4:47], mean) # calculate mean for all stats
# Ensure table is dataframe
player <- as.data.frame(player)
# round numeric columns to 2 decimal places
player <- player %>% mutate_if(is.numeric, ~round(., 2))Which produces the below table:
| player_id | player_first_name | player_last_name | kicks | handballs | disposals | disposal_efficiency_percentage | goals | behinds | hitouts | tackles | rebounds | inside_fifties | clearances | clangers | free_kicks_for | free_kicks_against | contested_possessions | uncontested_possessions | brownlow_votes | contested_marks | marks_inside_fifty | one_percenters | bounces | goal_assists | time_on_ground_percentage | centre_clearances | stoppage_clearances | score_involvements | metres_gained | turnovers | intercepts | tackles_inside_fifty | contest_def_losses | contest_def_one_on_ones | contest_off_one_on_ones | contest_off_wins | def_half_pressure_acts | effective_kicks | f50_ground_ball_gets | ground_ball_gets | hitouts_to_advantage | intercept_marks | pressure_acts | ruck_contests | score_launches | spoils |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11084 | Robert | Murphy | 12.24 | 7.53 | 19.76 | 77.00 | 0.59 | 0.41 | 0.00 | 2.12 | 1.47 | 4.06 | 0.71 | 2.18 | 0.59 | 0.71 | 4.41 | 15.24 | 0.00 | 0.24 | 0.53 | 0.94 | 0.82 | 0.71 | 81.53 | 0.12 | 0.53 | 5.29 | 378.53 | 4.18 | 2.59 | 0.29 | 0.24 | 0.53 | 0.76 | 0.47 | 4.18 | 9.12 | 0.35 | 3.53 | 0.00 | 0.41 | 8.71 | 0.00 | 0.94 | 0.65 |
| 11095 | Drew | Petrie | 8.25 | 3.81 | 12.06 | 67.56 | 1.00 | 0.19 | 12.19 | 3.31 | 0.81 | 1.75 | 1.75 | 2.44 | 1.50 | 1.38 | 7.00 | 6.00 | 0.00 | 1.31 | 1.00 | 4.06 | 0.00 | 0.25 | 81.94 | 0.69 | 1.06 | 3.75 | 196.12 | 2.62 | 1.94 | 0.81 | 0.19 | 0.62 | 1.75 | 0.56 | 5.31 | 5.06 | 0.38 | 3.31 | 3.88 | 0.62 | 12.94 | 34.88 | 1.06 | 3.38 |
| 11139 | Scott | Thompson | 5.00 | 5.00 | 10.00 | 80.00 | 0.00 | 0.00 | 0.00 | 5.00 | 1.00 | 4.00 | 1.00 | 5.00 | 0.00 | 5.00 | 5.00 | 6.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 1.00 | 72.00 | 1.00 | 0.00 | 3.00 | 220.00 | 4.00 | 4.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 14.00 | 4.00 | 1.00 | 4.00 | 0.00 | 1.00 | 24.00 | 0.00 | 1.00 | 0.00 |
| 11153 | Nick | Riewoldt | 10.94 | 5.88 | 16.82 | 76.00 | 1.71 | 0.71 | 0.06 | 2.24 | 0.47 | 2.29 | 0.82 | 1.71 | 1.18 | 0.35 | 5.29 | 11.82 | 0.24 | 1.24 | 2.18 | 1.59 | 0.06 | 0.53 | 89.76 | 0.00 | 0.82 | 6.47 | 203.35 | 2.82 | 1.18 | 1.24 | 0.12 | 0.24 | 2.76 | 0.82 | 2.53 | 7.53 | 1.12 | 2.65 | 0.00 | 0.29 | 10.53 | 0.12 | 0.41 | 0.94 |
| 11170 | Gary | Ablett | 14.34 | 10.86 | 25.20 | 68.66 | 1.00 | 0.75 | 0.00 | 4.28 | 1.66 | 4.94 | 3.94 | 4.12 | 0.91 | 1.58 | 10.19 | 15.02 | 0.64 | 0.23 | 0.58 | 0.69 | 0.36 | 0.73 | 85.89 | 1.50 | 2.42 | 7.11 | 417.39 | 5.08 | 2.23 | 0.77 | 0.08 | 0.14 | 0.77 | 0.28 | 5.88 | 8.80 | 1.84 | 7.81 | 0.00 | 0.12 | 15.66 | 0.02 | 1.39 | 0.20 |
| 11192 | Shaun | Burgoyne | 9.18 | 8.07 | 17.25 | 76.87 | 0.60 | 0.41 | 0.00 | 3.69 | 1.53 | 2.79 | 2.37 | 2.15 | 0.78 | 1.01 | 7.44 | 9.51 | 0.07 | 0.32 | 0.51 | 2.24 | 0.25 | 0.59 | 77.19 | 0.90 | 1.47 | 4.91 | 239.76 | 2.62 | 3.07 | 0.59 | 0.19 | 0.69 | 0.79 | 0.22 | 6.78 | 6.29 | 0.99 | 4.93 | 0.00 | 0.46 | 15.21 | 0.00 | 1.18 | 1.06 |
Additional Fields
Number of games
A consideration for this cohort is the presence of outliers that are caused by players who have only taken part in a small number of games. If a player had 3 exceptional games before getting injured/retiring, their average performance may not be especially representative of what would be expected over a longer period of time
Of course, it’s nice to keep these players in the dataset. By deriving a variable of the # of games played, users can set this parameter to include/exclude players as they see appropriate. What constitutes a representative sample of games per player is ultimately subjective, but I have set the default for the app at 22 games - equivalent to one season of football, excluding finals
Calculating the number of games requires aggregation of player_id then a join back to the player-statistics table
Player Team
The team of the player is included in the original dataset that has individual match-stats
However players routinely change teams as part of trades & free-agency. This complicates things as some players will have played in multiple teams. As such, we are specifically interested in a players most recent team
This can be elucidated by ordering the original dataset from most recent to least recent game, then filtering to keep only a distinct record of the player_id & player_team from their most recent match
Player Position
Player position is a useful filter to narrow the data to only players for which a statistic is relevant. For example, a defender would be expected to perform well on spoils, but not score involvements. As such, the option to limit the data to defenders would be welcome
By extracting out the unique elements of the player_position field …
position <- afl.1 %>%
select(player_id,
player_position)
unique_positions <- unique(position$player_position)
unique_positions## [1] "INT" "FF" "FPL" "R" "BPR" "C" "WL" "FPR" "RK" "HFFR"
## [11] "CHB" "FB" "HFFL" "BPL" "CHF" "HBFL" "WR" "RR" "HBFR"
We observe 19 distinct positions, corresponding to the specific position of the selected team for each match … i.e. left forward pocket & right half back flank. For the present analysis, this level of positional-granularity isn’t required. We’re more interested in broadly distinguishing …
- Forwards
- Midfielders
- Defenders
- Rucks
Let’s bucket the positions into these broader categories
# Bucketed positions
# Midfielders
position$PositionType[
position$player_position == "C" |
position$player_position == "RR" |
position$player_position == "R" |
position$player_position == "WL" |
position$player_position == "WR"] <-
"Midfield"
#Defenders
position$PositionType[
position$player_position == "BPL" |
position$player_position == "FB" |
position$player_position == "BPR" |
position$player_position == "HBFL" |
position$player_position == "HBFR" |
position$player_position == "CHB"] <-
"Defender"
#Forwards
position$PositionType[
position$player_position == "FPR" |
position$player_position == "FF" |
position$player_position == "HFFL" |
position$player_position == "FPL" |
position$player_position == "HFFR" |
position$player_position == "CHF"] <-
"Forward"
#Rucks
position$PositionType[
position$player_position == "RK"] <-
"Ruck"
# Interchange
position$PositionType[
position$player_position == "INT"] <-
"Interchange"Creating a new variable Position Type
Player position in this data is captured on a per game basis, meaning some players will accrue multiple positions. For example, Dustin Martin is named as a forward in some games, and on the ball in others
One way to navigate this is to count all the positions each player has been named in, then filter to only the position with the highest count of games for each player (a proxy for their most common position)
# count how often players named in a specific position
position2 <- position %>%
group_by(player_id,
PositionType) %>%
summarise(Position_Count = dplyr::n()) # count of positions per player
# Order by count of positions, for each player
position2 <- position2[with(position2,
order(player_id,
Position_Count)), ]
# keep only the 'position' with the max count
position2 <- position2 %>%
group_by(player_id) %>%
top_n(1, Position_Count)
# For rare cases where two positions have an equal game count, keep only one position
position2 <- position2 %>% distinct(player_id, .keep_all = TRUE)
# remove position count
position2$Position_Count <- NULL
# Join back to dataframe
player <- left_join(player, position2, by = "player_id")This is satisfactory in that every player has now been ascribed a unique position
However, one further issue is that players who started on the bench more than on the ground have been designated interchange - strictly speaking, not a position.
I explored removing interchange entirely before filtering. However, of the ~ 950 total players, 53 had only been named on the interchange. By removing interchange, some players were left without a position
This gap could be remedied be googling, then manually imputing the position of these players. But for all intents & purposes (ergo laziness of the author), a small number of player-positions designated as interchange will not dramatically impact the visualization
As such, our metric is strictly speaking, most commonly named starting position type
Final Tidying
To finish up, I have cleaned up the dataframe (re-order variables, changed column types removed redundant columns)
# remove ID
player$player_id <- NULL
# combine first and last name
player <- player %>%
unite(Player_Name,
c(player_first_name, player_last_name),
remove = TRUE)
# remove underscore
player$Player_Name <- gsub("_", " ", player$Player_Name, fixed = TRUE)
# Re-order Columns
player <- player[,c(3, 1, 2, 48, 4:47)]
# team as factor
player$player_team <- as.factor(player$player_team)
# position as factor
player$PositionType <- as.factor(player$PositionType)
# games as numeric
player$Games <- as.numeric(player$Games)Resulting in the below final dataset:
| Player_Name | player_team | Games | PositionType | kicks | handballs | disposals | disposal_efficiency_percentage | goals | behinds | hitouts | tackles | rebounds | inside_fifties | clearances | clangers | free_kicks_for | free_kicks_against | contested_possessions | uncontested_possessions | brownlow_votes | contested_marks | marks_inside_fifty | one_percenters | bounces | goal_assists | time_on_ground_percentage | centre_clearances | stoppage_clearances | score_involvements | metres_gained | turnovers | intercepts | tackles_inside_fifty | contest_def_losses | contest_def_one_on_ones | contest_off_one_on_ones | contest_off_wins | def_half_pressure_acts | effective_kicks | f50_ground_ball_gets | ground_ball_gets | hitouts_to_advantage | intercept_marks | pressure_acts | ruck_contests | score_launches | spoils |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jarrod Witts | Gold Coast | 77 | Ruck | 7.06 | 6.13 | 13.19 | 64.44 | 0.09 | 0.21 | 38.47 | 3.10 | 1.16 | 1.97 | 4.04 | 2.47 | 1.61 | 1.03 | 7.96 | 5.21 | 0.05 | 0.45 | 0.25 | 2.84 | 0.03 | 0.17 | 84.64 | 1.30 | 2.75 | 3.49 | 167.13 | 2.40 | 1.91 | 0.48 | 0.17 | 0.68 | 0.84 | 0.16 | 5.56 | 3.52 | 0.55 | 4.69 | 10.52 | 0.58 | 10.91 | 75.06 | 2.56 | 2.25 |
| Lachie Weller | Gold Coast | 76 | Midfield | 11.62 | 8.08 | 19.70 | 72.45 | 0.30 | 0.26 | 0.00 | 2.59 | 3.38 | 3.03 | 1.83 | 2.54 | 0.70 | 0.61 | 5.70 | 13.05 | 0.04 | 0.11 | 0.07 | 1.39 | 1.00 | 0.22 | 83.71 | 0.42 | 1.42 | 3.37 | 393.43 | 4.36 | 3.55 | 0.28 | 0.20 | 0.66 | 0.08 | 0.00 | 7.96 | 7.58 | 0.47 | 4.43 | 0.00 | 0.49 | 14.00 | 0.00 | 0.92 | 1.01 |
| Ben Ainsworth | Gold Coast | 56 | Forward | 8.32 | 5.95 | 14.27 | 65.95 | 0.71 | 0.68 | 0.00 | 2.46 | 0.57 | 2.77 | 1.21 | 2.43 | 0.84 | 0.66 | 5.55 | 8.84 | 0.00 | 0.29 | 0.68 | 0.73 | 0.21 | 0.45 | 75.27 | 0.32 | 0.89 | 4.07 | 231.70 | 3.14 | 1.57 | 0.79 | 0.04 | 0.05 | 0.91 | 0.16 | 5.73 | 4.84 | 1.07 | 4.20 | 0.00 | 0.11 | 15.64 | 0.00 | 0.57 | 0.32 |
| Noah Anderson | Gold Coast | 15 | Interchange | 8.73 | 7.00 | 15.73 | 61.73 | 0.27 | 0.27 | 0.00 | 1.87 | 0.93 | 2.53 | 2.00 | 2.40 | 0.60 | 0.53 | 6.67 | 9.13 | 0.00 | 0.13 | 0.33 | 1.07 | 0.20 | 0.27 | 75.80 | 0.40 | 1.60 | 3.27 | 248.40 | 3.87 | 2.60 | 0.20 | 0.07 | 0.13 | 0.27 | 0.07 | 6.47 | 4.20 | 0.27 | 5.53 | 0.00 | 0.20 | 12.20 | 0.00 | 0.60 | 0.73 |
| Mitch Robinson | Brisbane Lions | 64 | Midfield | 11.86 | 7.56 | 19.42 | 69.73 | 0.62 | 0.36 | 0.00 | 4.09 | 1.81 | 3.92 | 2.78 | 3.03 | 1.23 | 1.16 | 7.75 | 11.64 | 0.09 | 0.48 | 0.47 | 1.84 | 0.08 | 0.47 | 79.77 | 0.84 | 1.94 | 4.62 | 362.67 | 4.09 | 3.30 | 0.69 | 0.09 | 0.28 | 0.41 | 0.20 | 6.59 | 7.67 | 0.69 | 5.20 | 0.00 | 0.73 | 14.81 | 0.00 | 1.12 | 1.14 |
| Daniel Rich | Brisbane Lions | 75 | Defender | 14.99 | 5.47 | 20.45 | 77.93 | 0.19 | 0.27 | 0.00 | 1.83 | 5.16 | 2.85 | 1.13 | 2.45 | 0.76 | 0.79 | 5.43 | 12.76 | 0.05 | 0.27 | 0.09 | 2.33 | 0.04 | 0.31 | 82.36 | 0.27 | 0.87 | 3.85 | 446.96 | 3.92 | 5.40 | 0.05 | 0.25 | 0.95 | 0.01 | 0.01 | 6.43 | 11.49 | 0.11 | 4.09 | 0.00 | 1.28 | 9.61 | 0.00 | 1.45 | 1.81 |
Deploying the Shiny App
With this data, we can build out an application to create a dynamic, interactive scatterplot
For those not au-fait with R, I have hosted the app on Shiny Apps. As I am using a free-account, this link is subject to bandwidth limits, & may not work!
Alternatively, the full code to deploy this app locally can be found at my GitHub page
Clicking the app-link, or running this script will produce something like the below:
In the left hand panel, there are several parameters that can be changed. Users can select any of the included statistics to compare on the scatterplot
Players can be further filtered by their team, position, or the number of games they have played between 2017 and 2020
Visualisation & Exploration of the data
Although the above Shiny app is the definitive way to explore this data, it can just as easily be replicated manually with ggplot2 & plotly
In the below section, I will showcase some player/metric combinations I found to be interesting
To begin, let’s create a generic ggplot theme for visualization
Plot_Theme2 <-
theme_minimal() + # start with a minimal theme and add what we need
theme(text = element_text(color = "gray20"),
legend.position = "top", # position the legend in the upper left
legend.direction = "horizontal",
legend.justification = 0.1, # anchor point for legend.position.
legend.title = element_text(size = 10, color = "gray20"),
legend.text = element_text(size = 10, color = "gray20"),
axis.text = element_text(face = "italic"),
axis.title.x = element_text(vjust = -1, size = 10), # move title away axis
axis.title.y = element_text(vjust = 2, size = 10), # move away for axis
axis.ticks.y = element_blank(), # remove elements
axis.line = element_line(color = "gray20", size = 0.5),
axis.line.y = element_blank(),
panel.grid.major = element_line(color = "gray80", size = 0.5),
panel.grid.minor = element_line(color = "gray80", size = 0.5),
panel.grid.major.x = element_blank(),
plot.title = element_text(size = 11)
)& create a number of custom text-overlays for specific players, so their data-point can be directly identified on the plot
Buddy <- player %>%
filter(Player_Name %in% c("Lance Franklin"))
McGovern <- player %>%
filter(Player_Name %in% c("Jeremy McGovern"))
SaadMcKenna <- player %>%
filter(Player_Name %in% c("Adam Saad",
"Conor McKenna"))In the spirit of The Flying Bulldozer - let’s introduce three further sobriquets
Jeremy McGovern - “The Roaming Backboard”
Jeremy McGovern is considered by many to be the premium spare defender in the competition, with a remarkable knack for reading the ball in the air. The stats emphatically support this
p1 <- player %>%
filter(Games >= 22) %>%
ggplot(aes(label = Player_Name,
label2 = Games,
color = PositionType))
p1 <- p1 + geom_point(aes(intercept_marks,
contested_marks),
shape = 16,
size = 3,
alpha = 4/10) +
geom_text(data = McGovern,
size = 2,
colour = "#730202",
aes(intercept_marks, contested_marks)) +
scale_colour_manual(values=c("#b80909",
"#0d6cb5",
"#e0842d",
"#31ad28",
"#e3c609")) +
Plot_Theme2
p1 <- ggplotly(p1)
p1This makes Jeremy McGovern a clear outlier.When balls are launched into defensive 50, they are more often than not going to be repelled by the best defensive, contested, aerialist in the business - the roaming backboard
Buddy Franklin - “The Scoring Beast”
Buddy, & his accolades as a player need no introduction. No-one would argue there is a better current-day forward in the competition.
It’s stark just how superior Buddy’s scoring power is to other players
p2 <- player %>%
filter(Games >= 22) %>%
ggplot(aes(label = Player_Name,
label2 = Games,
color = PositionType))
p2 <- p2 + geom_point(aes(goals,
score_involvements),
shape = 16,
size = 3,
alpha = 4/10) +
geom_text(data = Buddy,
size = 2,
colour = "#07406b",
aes(goals, score_involvements)) +
scale_colour_manual(values=c("#b80909",
"#0d6cb5",
"#e0842d",
"#31ad28",
"#e3c609")) +
Plot_Theme2
p2 <- ggplotly(p2)
p2Hopefully we will see an injury free Lance Franklin in 2021
Adam Saad & Conor McKenna - “The Bouncing-Bomber-Brothers”
There are not many more exciting sights in the AFL than a player streaming down the field at pace, thumping the ball into the turf
Over the past four seasons, only 5 players have averaged more than 1.5 running bounces per game, & of these 5, there are two clear outliers. The gentlemen in question are Conor McKenna & Adam Saad - who interestingly are both running defenders for Essendon. For this reason they are ordained the bouncing bomber brothers
p4 <- player %>%
filter(Games >= 22) %>%
ggplot(aes(label = Player_Name,
label2 = Games,
color = PositionType))
p4 <- p4 + geom_point(aes(bounces,
metres_gained),
shape = 16,
size = 3,
alpha = 4/10) +
geom_text(data = SaadMcKenna,
size = 2,
colour = "#730202",
aes(bounces, metres_gained)) +
scale_colour_manual(values=c("#b80909", "#0d6cb5", "#e0842d", "#31ad28", "#e3c609")) +
Plot_Theme2
p4 <- ggplotly(p4)
p4The ‘brothers’ also average ~ 350m gained per match, so its fair to say their fleetness of foot has been essential to Essendon’s territory game
Sadly Conor McKenna has announcement his retirement & return to Ireland. While speculation is mounting Adam Saad may request a trade away from Essendon. The brotherhood is due to end just as soon as it has been anointed
For anyone reading, I will be interested in any other insights to come from this app!